Marvin Martin & Aflak Michel Omar (ING5 BDA Gr01A)
26/11/2020
Airbnb daily data is very valuable: an investor eager to make key real-estate decisions can use it to identify the most profitable options. In this project, we use scraped data from 6 countries, aggregated over a period of time (for each city of these countries we kept the 3 latest collection dates).
These datasets can be downloaded from http://insideairbnb.com/get-the-data.html. A CSV file is available at data/all_data_urls.csv, where all the scraped URLs are listed and ready to download.
Because these datasets are huge, we applied some preprocessing to focus on the important information while keeping the amount of data reasonable (to fit computation and time constraints). We went through several steps:
################### Code From utils/tools.R ##################################
urls <- read.csv(file.path("./data/all_data_urls.csv")) # Step 1
df <- extract_all_meta(urls) # Step 2
latest_dates <- 3 # Step 3
countries <- c("france", "spain", "the-netherlands", "germany", "belgium", "italy") # Step 4
download_data(df, countries, latest_dates) # Step 5

This reduces the data size from several GB to only about a hundred MB. We are now ready to play with it!
Starting with raw data, we’ve been through several steps:
[Step 1] Load the CSV with the URLs and metadata (read.csv)
[Step 2] Extract “country”, “region”, “city”, “date” and “url” from the CSV into a dataframe (extract_all_meta)
[Step 3] Specify the number “n” of latest scraping dates you are looking for.
[Step 4] Select the list of 6 countries you want to work on.
[Step 5] Go through this dataframe line by line and perform the following steps (download_data and prepare_data):
[Step 5 - Remark] This step results in one big CSV per city of the listed countries, of the form /data/countries/listings_CITY_NAME_date.csv.
We could have written these CSVs to files, but since this step takes more than 10 minutes, we preferred to keep them in memory.
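The per-row download-and-prepare loop of Step 5 can be sketched as follows (an illustrative sketch only: the kept column list and the exact reading logic are assumptions, not the real code from utils/tools.R, and `df` is assumed to be already restricted to the latest dates):

```r
# Columns we assume prepare_data keeps from the raw listings files
keep_cols <- c("id", "latitude", "longitude", "property_type", "room_type",
               "accommodates", "bedrooms", "beds", "price",
               "minimum_nights", "maximum_nights",
               "review_scores_rating", "availability_30")

download_data_sketch <- function(df, countries, latest_dates) {
  df <- df[df$country %in% countries, ]          # keep only selected countries
  listings <- list()
  for (i in seq_len(nrow(df))) {
    row <- df[i, ]
    raw <- read.csv(row$url)                     # download one city's listings
    kept <- raw[, intersect(keep_cols, names(raw))]
    # attach the metadata extracted in Step 2
    kept$country <- row$country
    kept$region  <- row$region
    kept$city    <- row$city
    kept$date    <- row$date
    listings[[i]] <- kept                        # keep in memory, not on disk
  }
  listings
}
```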
[Step 6] Get the final preprocessed dataset by merging all the city CSVs into a single data frame (load_global_listings).
This step is performed when the server starts and takes around 20 seconds.
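Since every per-city data frame shares the same columns after Step 5, the merge of Step 6 amounts to stacking them; a minimal sketch (the function body is an assumption, not the actual load_global_listings) using data.table:

```r
library(data.table)

# Stack all per-city data frames into one global listings table.
# `city_frames` is assumed to be the in-memory list built in Step 5.
load_global_listings_sketch <- function(city_frames) {
  rbindlist(city_frames, use.names = TRUE, fill = TRUE)
}
```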
Here is the shape of our dataset:
# Publications: 1321825
# Features: 21
Feature names are:
## - id
## - country
## - region
## - city
## - date
## - neighbourhood_cleansed
## - latitude
## - longitude
## - property_type
## - room_type
## - accommodates
## - bedrooms
## - beds
## - price
## - minimum_nights
## - maximum_nights
## - review_scores_rating
## - availability_30
## - price_30
## - revenue_30
## - latitudelongitude
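The last few features are derived rather than scraped. One plausible reconstruction (hedged: the exact formulas in prepare_data may differ) is that price_30 and revenue_30 estimate the 30-day price and revenue from price and availability_30, and latitudelongitude is the single "lat:long" string that googleVis maps expect:

```r
# Assumed derivations for the computed columns (illustrative only)
add_derived_features <- function(df) {
  booked_30 <- 30 - df$availability_30          # nights booked in the next 30 days
  df$price_30   <- df$price * 30                # price over a full 30 nights
  df$revenue_30 <- df$price * booked_30         # estimated revenue over 30 days
  # googleVis map charts expect a "latitude:longitude" location column
  df$latitudelongitude <- paste(df$latitude, df$longitude, sep = ":")
  df
}
```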
Tab 1 - Analysis comparing several cities
Tab 2 - Analysis of a single city
We used several libraries (web app, plotting, data manipulation) to build this project:
shiny, googleVis, ggplot2, dplyr, data.table, stringr and glue
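Loading them at the top of the app looks like this (install.packages(...) first if any are missing):

```r
library(shiny)       # web application framework
library(googleVis)   # interactive charts and maps (htmlOutput)
library(ggplot2)     # static plots (plotOutput)
library(dplyr)       # data manipulation
library(data.table)  # fast tables and merging
library(stringr)     # string helpers
library(glue)        # string interpolation
```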
################### Code From shinyApp/ui.R ##################################
# THIS IS PSEUDOCODE !!!
fluidPage
  tabsetPanel
    tabPanel # Analysis 1 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          checkboxInput, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput, ...
    tabPanel # Analysis 2 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          checkboxInput, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput, ...

################### Code From shinyApp/server.R ##################################
# THIS IS PSEUDOCODE !!!
listings <- load_global_listings() # Load the preprocessed data (Step 6)
# Server
server
  # Tab 1 variables
  reactive # reactive data frame (filtered by country / cities / features)
  renderUI # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # plots sent from server to htmlOutput / plotOutput (histogram, ...)
  # Tab 2 variables
  reactive # reactive data frame (filtered to a single city)
  renderUI # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # plots sent from server to htmlOutput / plotOutput (map, ...)

Each tab is split into two vertical parts: Tool Bar and Plots.
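The pseudocode above can be condensed into a minimal runnable sketch (one tab only; the data frame, input IDs, and columns are placeholders, not the project's real ones):

```r
library(shiny)
library(ggplot2)

# Placeholder data standing in for the real listings table
listings <- data.frame(
  city  = rep(c("Paris", "Madrid"), each = 50),
  price = c(rnorm(50, mean = 120, sd = 30), rnorm(50, mean = 90, sd = 25))
)

ui <- fluidPage(
  tabsetPanel(
    tabPanel("Analysis 2",
      sidebarLayout(
        sidebarPanel(uiOutput("city_picker")),  # Tool Bar
        mainPanel(plotOutput("price_hist"))     # Plots
      )
    )
  )
)

server <- function(input, output) {
  # UI built server-side, as in the renderUI pattern above
  output$city_picker <- renderUI(
    selectInput("city", "City", choices = unique(listings$city))
  )
  # Reactive data frame filtered to the selected city
  filtered <- reactive({
    req(input$city)
    listings[listings$city == input$city, ]
  })
  output$price_hist <- renderPlot(
    ggplot(filtered(), aes(price)) + geom_histogram(bins = 20)
  )
}

# shinyApp(ui, server)  # uncomment to launch the app
```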
You can: